31 research outputs found

Concept-modulated model-based offline reinforcement learning for rapid generalization

Author: Ketz Nicholas A.
Pilly Praveen K.
Publication date: 07/09/2022

The robustness of any machine learning solution is fundamentally bound by the data it was trained on. One way to generalize beyond the original training is through human-informed augmentation of the original dataset; however, it is impossible to specify all possible failure cases that can occur during deployment. To address this limitation, we combine model-based reinforcement learning and model-interpretability methods to propose a solution that self-generates simulated scenarios constrained by environmental concepts and dynamics learned in an unsupervised manner. In particular, an internal model of the agent's environment is conditioned on low-dimensional concept representations of the input space that are sensitive to the agent's actions. We demonstrate this method within a standard realistic driving simulator in a simple point-to-point navigation task, where we show dramatic improvements in one-shot generalization to different instances of specified failure cases, as well as zero-shot generalization to similar variations, compared to model-based and model-free approaches.

arXiv.org e-Print Archive

Context Meta-Reinforcement Learning via Neuromodulation

Author: Ben-Iwhiwhu Eseoghene
Dick Jeffery
Ketz Nicholas A.
Pilly Praveen K.
Soltoggio Andrea
Publication venue: 'Elsevier BV'
Publication date: 12/04/2022

Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt quickly to tasks from few samples in dynamic environments. Such a feat is achieved through dynamic representations in an agent's policy network (obtained via reasoning about task context, model parameter updates, or both). However, obtaining rich dynamic representations for fast adaptation beyond simple benchmark problems is challenging due to the burden placed on the policy network to accommodate different policies. This paper addresses the challenge by introducing neuromodulation as a modular component to augment a standard policy network that regulates neuronal activities in order to produce efficient dynamic representations for task adaptation. The proposed extension to the policy network is evaluated across multiple discrete and continuous control environments of increasing complexity. To prove the generality and benefits of the extension in meta-RL, the neuromodulated network was applied to two state-of-the-art meta-RL algorithms (CAVIA and PEARL). The results demonstrate that meta-RL augmented with neuromodulation produces significantly better results and richer dynamic representations than the baselines.

arXiv.org e-Print Archive

Loughborough University Institutional Repository

Sliced Cramér synaptic consolidation for preserving deeply learned representations

Author: Andrea Soltoggio
Nicholas A Ketz
Praveen K Pilly
Soheil Kolouri
Publication date: 14/03/2020

Deep neural networks suffer from the inability to preserve the learned data representation (i.e., catastrophic forgetting) in domains where the input data distribution is non-stationary and changes during training. Various selective synaptic plasticity approaches have recently been proposed to preserve network parameters that are crucial for previously learned tasks while learning new tasks. We explore such selective synaptic plasticity approaches through a unifying lens of memory replay and show the close relationship between methods like Elastic Weight Consolidation (EWC) and Memory-Aware-Synapses (MAS). We then propose a fundamentally different class of preservation methods that aim at preserving the distribution of the network’s output at an arbitrary layer for previous tasks while learning a new one. We propose the sliced Cramér distance as a suitable choice for such preservation and evaluate our Sliced Cramér Preservation (SCP) algorithm through extensive empirical investigations on various network architectures in both supervised and unsupervised learning settings. We show that SCP consistently utilizes the learning capacity of the network better than online-EWC and MAS methods on various incremental learning tasks.

Loughborough University Institutional Repository

A-EMS: An Adaptive Emergency Management System for Autonomous Agents in Unforeseen Situations

Author: Ketz Nicholas
Maguire Glenn
Mouret Jean-Baptiste
Pilly Praveen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Reinforcement learning agents are unable to respond effectively when faced with novel, out-of-distribution events until they have undergone a significant period of additional training. For lifelong learning agents, which cannot be simply taken offline during this period, suboptimal actions may be taken that can result in unacceptable outcomes. This paper presents the Autonomous Emergency Management System (A-EMS), an online, data-driven, emergency-response method that aims to give autonomous agents the ability to react to unexpected situations that are very different from those they have been trained or designed to address. The proposed approach devises a customized response to the unforeseen situation sequentially, by selecting actions that minimize the rate of increase of the reconstruction error from a variational autoencoder. This optimization is achieved online in a data-efficient manner (on the order of 30 to 80 data points) using a modified Bayesian optimization procedure. The potential of A-EMS is demonstrated through emergency situations devised in a simulated 3D car-driving application.

INRIA a CCSD electronic archive server

Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture

Author: Ben-Iwhiwhu Eseoghene
Dick Jeffery
Hu Yang
Ketz Nicholas
Kolouri Soheil
Krichmar Jeffrey L.
Ladosz Pawel
Pilly Praveen
Soltoggio Andrea
Publication date: 24/09/2021

This paper presents a new neural architecture that combines a modulated Hebbian network (MOHN) with DQN, which we call modulated Hebbian plus Q network architecture (MOHQA). The hypothesis is that such a combination allows MOHQA to solve difficult partially observable Markov decision process (POMDP) problems which impair temporal difference (TD)-based RL algorithms such as DQN, as the TD error cannot be easily derived from observations. The key idea is to use a Hebbian network with bio-inspired neural traces in order to bridge temporal delays between actions and rewards when confounding observations and sparse rewards result in inaccurate TD errors. In MOHQA, DQN learns low-level features and control, while the MOHN contributes to the high-level decisions by associating rewards with past states and actions. Thus, the proposed architecture combines two modules with significantly different learning algorithms, a Hebbian associative network and a classical DQN pipeline, exploiting the advantages of both. Simulations on a set of POMDPs and on the MALMO environment show that the proposed algorithm improved DQN's results and even outperformed control tests with A2C, QRDQN+LSTM and REINFORCE algorithms on some POMDPs with confounding stimuli and sparse rewards.

arXiv.org e-Print Archive

Loughborough University Institutional Repository

ScholarWorks@UNIST

A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems

Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development: both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.

Comment: To appear in Neural Networks

arXiv.org e-Print Archive

Loughborough University Institutional Repository

Dose-Dependent Effects of Closed-Loop tACS Delivered During Slow-Wave Oscillations on Memory Consolidation

Author: Aaron P. Jones
Angela Combs
Bradley Robert
Charles S. H. Robinson
Hope A. Gill
Jaehoon Choe
Melanie L. Lamphere
Melissa D. Heinrich
Michael D. Howard
Natalie B. Bryant
Nicholas A. Ketz
Praveen K. Pilly
Steven W. Skorheim
Vincent P. Clark
Publication venue: 'Frontiers Media SA'
Publication date: 01/11/2018

Sleep is critically important to consolidate information learned throughout the day. Slow-wave sleep (SWS) serves to consolidate declarative memories, a process previously modulated with open-loop non-invasive electrical stimulation, though not always effectively. These failures to replicate could be explained by the fact that stimulation has only been performed in open-loop, as opposed to closed-loop where phase and frequency of the endogenous slow-wave oscillations (SWOs) are matched for optimal timing. The current study investigated the effects of closed-loop transcranial Alternating Current Stimulation (tACS) targeting SWOs during sleep on memory consolidation. Twenty-one participants took part in a three-night, counterbalanced, randomized, single-blind, within-subjects study, investigating performance changes (correct rate and F1 score) on images in a target detection task over 24 h. During sleep, 1.5 mA closed-loop tACS was delivered in phase over electrodes at F3 and F4 and 180° out of phase over electrodes at bilateral mastoids at the frequency (range 0.5–1.2 Hz) and phase of ongoing SWOs for a duration of 5 cycles in each discrete event throughout the night. Data were analyzed in a repeated measures ANOVA framework, and results show that verum stimulation improved post-sleep performance specifically on generalized versions of images used in training at both morning and afternoon tests compared to sham, suggesting the facilitation of schematization of information, but not of rote, veridical recall. We also found a surprising inverted U-shaped dose effect of sleep tACS, which is interpreted in terms of tACS-induced facilitatory and subsequent refractory dynamics of SWO power in scalp EEG. This is the first study showing a selective modulation of long-term memory generalization using a novel closed-loop tACS approach, which holds great potential for both healthy and neuropsychiatric populations.

Directory of Open Access Journals

Functional Role of Neural Oscillations in Attentional Inhibition and Long Term Memory Retrieval

Author: Ketz Nicholas
Publication venue: CU Scholar
Publication date: 01/01/2016

How does the brain selectively retrieve information from long term memory? What neural mechanisms are critical for this process, and how are these mechanisms brought into service in a task dependent way? What are the implications for the representations that are processed through these mechanisms, and can we use our understanding of them to better utilize encoding and retrieval of information in long term memory? These are some of the fundamental questions being addressed in this dissertation. Through the use of neural network models of the hippocampus and surrounding cortex, this dissertation proposes a framework for understanding how time frequency signatures measured at the scalp can be used to track long term memory processes and make quantitative predictions about how information in long term memory is altered by these processes. The fundamental thesis of this dissertation is that neural oscillations in the theta (3-8 Hz), alpha (8-12 Hz), and beta (12-30 Hz) frequency bands can be tied to specific functional mechanisms supporting long term memory, and that these oscillatory signatures can be tracked in human scalp EEG recordings to predict behavioral changes in the retrieval of items from memory. Specifically, oscillatory power in the theta band positively correlates with how much information the hippocampus is reactivating for a given retrieval event, power in the alpha band positively correlates with how much information is being inhibited from being retrieved, and beta power negatively correlates with how much non-hippocampal dependent information is being retrieved. This thesis is supported by three behavioral experiments, two EEG experiments and two explorations with a computational neural network model of the hippocampus and surrounding cortex.

CU Scholar Institutional Repository